Multimodal Emotion Perception: Analogous to Speech Processes
Abstract
The fuzzy logical model of perception (FLMP) has been successful in accounting for a variety of unimodal and multimodal aspects of speech perception. The same framework has been extended to account for the perception of emotion from the face and the voice. The FLMP accurately describes how perceivers evaluate and integrate these sources of information to determine the affect signaled by the talker. The same research falsifies accounts of emotion processing as a specialized analysis such as holistic or categorical perception.

1. PERCEIVING EMOTION IN FACES

Face recognition and the perception of facial expression are now being studied intensively by cognitive scientists and neuroscientists. This field might be viewed as being at a stage similar to where the study of speech perception was about two decades ago. Only a handful of hard-nosed experimental psychologists have brought the phenomenon into the laboratory and subjected it to the disinterested scrutiny of empirical inquiry. Much of the previous literature has also been overburdened by a casual and less than well-informed application of evolutionary theory. In spite of our belief in universal rather than domain-specific processes, we acknowledge that the information sources available for emotion perception belong to a different family than those available for speech perception. Nonetheless, our successes (Massaro, 1998, Chapters 4 and 6) in demonstrating common principles of information processing across various domains encourage us to expect that these principles will hold up in the emotion domain.

We operate under the assumption that multiple sources of information are also used to perceive a person's emotion. These consist of a variety of paralinguistic signals co-occurring with the verbal content of the speech; they may be aspects of voice quality, facial expression, and body language. In order to study how multiple paralinguistic sources of information are used, it is important first to define these sources. In our research, two sources of paralinguistic information, facial expressions and vocal cues, were chosen to parallel the situation of bimodal speech. Baldi, our computer-animated talking head (see Figure 1), makes possible a set of quite realistic faces for research that are standardized and replicable, as well as controllable over a wide range of feature dimensions. Thus, it quickly became apparent that we could initiate a cottage industry in the study of facial and vocal cues to emotion. There was no shortage of literature on facial cues to emotion, but we found a tremendous void in the domain of vocal cues. We learned that Baldi had to be given ...

[Figure 1 (http://mambo.ucsc.edu/psl/pela/wg.jpg): the talking head, called Baldi, shown with its underlying wireframe model.]
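To make the model concrete, the following is a minimal sketch of the FLMP's evaluation, integration, and decision stages applied to one facial and one vocal cue to emotion. The emotion categories, the support values, and the function name are illustrative assumptions for this sketch, not material from the paper; the FLMP itself prescribes multiplicative integration of the sources followed by a relative-goodness decision rule.

```python
def flmp_integrate(face_support, voice_support):
    """Sketch of FLMP integration for two sources of information.

    face_support, voice_support: dicts mapping each emotion category to a
    fuzzy truth value in [0, 1] (how well that source supports the category).
    Returns the predicted probability of choosing each category.
    """
    categories = face_support.keys()
    # Integration: combined support is the product of the supports
    # contributed independently by each modality.
    combined = {c: face_support[c] * voice_support[c] for c in categories}
    # Decision: relative goodness rule -- normalize over all alternatives.
    total = sum(combined.values())
    return {c: combined[c] / total for c in categories}


# Hypothetical example: the face looks mostly happy while the voice sounds
# mildly angry; the model predicts how the two cues trade off.
face = {"happy": 0.8, "angry": 0.2}
voice = {"happy": 0.4, "angry": 0.6}
print(flmp_integrate(face, voice))
# {'happy': 0.727..., 'angry': 0.272...}
```

Under this formulation, an ambiguous cue (support near 0.5) leaves the decision to the other source, whereas two consistent cues reinforce each other, which is the signature pattern the FLMP fits in bimodal speech and, by extension, in bimodal emotion perception.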